
    Stubby: A Transformation-based Optimizer for MapReduce Workflows

    There is a growing trend of performing analysis on large datasets using workflows composed of MapReduce jobs connected through producer-consumer relationships based on data. This trend has spurred the development of a number of interfaces, ranging from program-based to query-based, for generating MapReduce workflows. Studies have shown that the performance gap between optimized and unoptimized workflows can be quite large. However, automatic cost-based optimization of MapReduce workflows remains a challenge due to the multitude of interfaces, the large size of the execution plan space, and the frequent unavailability of all the information needed for optimization. We introduce a comprehensive plan space for MapReduce workflows generated by popular workflow generators. We then propose Stubby, a cost-based optimizer that searches selectively through the subspace of the full plan space that can be enumerated correctly and costed based on the information available in any given setting. Stubby enumerates the plan space using plan-to-plan transformations and an efficient search algorithm. Stubby is designed to be extensible to new interfaces and new types of optimizations, a desirable feature given how rapidly MapReduce systems are evolving. Stubby's efficiency and effectiveness have been evaluated using representative workflows from many domains. Comment: VLDB 2012
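
    As a rough illustration of the transformation-based enumeration and search described above, here is a minimal best-first sketch in Python; the plan objects, transformation functions, and cost model are hypothetical stand-ins, not Stubby's actual API.

        import heapq
        import itertools

        def search_plans(initial_plan, transformations, cost, budget=1000):
            """Best-first search over the plan space induced by transformations.

            initial_plan:    a hashable plan object (hypothetical)
            transformations: functions mapping a plan to candidate rewrites
            cost:            estimated execution cost of a plan (lower is better)
            """
            tiebreak = itertools.count()  # avoids comparing plans when costs tie
            frontier = [(cost(initial_plan), next(tiebreak), initial_plan)]
            seen = {initial_plan}
            best_cost, best_plan = frontier[0][0], initial_plan
            while frontier and budget > 0:
                budget -= 1
                c, _, plan = heapq.heappop(frontier)
                if c < best_cost:
                    best_cost, best_plan = c, plan
                for transform in transformations:
                    for neighbor in transform(plan):  # each transformation proposes rewritten plans
                        if neighbor not in seen:
                            seen.add(neighbor)
                            heapq.heappush(frontier, (cost(neighbor), next(tiebreak), neighbor))
            return best_plan, best_cost

    Searching selectively, as the abstract describes, would then amount to restricting which transformations are applied based on the information available in a given setting.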

    Latent Emission-Augmented Perspective-Taking (LEAPT) for Human-Robot Interaction

    Perspective-taking is the ability to perceive or understand a situation or concept from another individual's point of view, and is crucial in daily human interactions. Enabling robots to perform perspective-taking remains an unsolved problem; existing approaches that use deterministic or handcrafted methods are unable to accurately account for uncertainty in partially-observable settings. This work proposes to address this limitation via a deep world model that enables a robot to perform both perceptual and conceptual perspective-taking, i.e., the robot is able to infer what a human sees and believes. The key innovation is a decomposed multi-modal latent state space model able to generate and augment fictitious observations/emissions. Optimizing the ELBO that arises from this probabilistic graphical model enables the learning of uncertainty in latent space, which facilitates uncertainty estimation from high-dimensional observations. We task our model with predicting human observations and beliefs on three partially-observable HRI tasks. Experiments show that our method significantly outperforms existing baselines and is able to infer the visual observations available to the other agent and that agent's internal beliefs.
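
    As a rough illustration of the ELBO optimization mentioned above, here is a minimal single-modality Gaussian latent-variable sketch in Python (PyTorch); the encoder and decoder are assumed callables, and LEAPT's decomposed multi-modal state space model is considerably richer than this.

        import torch

        def elbo(obs, encoder, decoder):
            """ELBO for a Gaussian latent model: E_q[log p(obs|z)] - KL(q || N(0, I))."""
            mu, log_var = encoder(obs)                    # parameters of q(z | obs)
            std = torch.exp(0.5 * log_var)
            z = mu + std * torch.randn_like(std)          # reparameterization trick
            recon = decoder(z)                            # mean of p(obs | z)
            recon_ll = -((obs - recon) ** 2).sum(dim=-1)  # Gaussian log-likelihood, up to a constant
            kl = 0.5 * (mu ** 2 + log_var.exp() - 1.0 - log_var).sum(dim=-1)
            return (recon_ll - kl).mean()                 # maximize this (minimize its negative)

    The learned variance terms are what carry the model's uncertainty in latent space, which is what makes uncertainty estimation from high-dimensional observations possible.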

    Remote sensing of Earth terrain

    Remote sensing of earth terrain is examined. The layered random medium model is used to investigate the fully polarimetric scattering of electromagnetic waves from vegetation. The model is used to interpret measured data for vegetation fields such as rice, wheat, or soybean over water or soil. Accurate calibration of polarimetric radar systems is essential for polarimetric remote sensing of earth terrain, and a polarimetric calibration algorithm using three arbitrary in-scene reflectors is developed. The random medium model is shown to be quite successful in interpreting active and passive microwave remote sensing data from earth terrain. A multivariate K-distribution is proposed to model the statistics of fully polarimetric radar returns from earth terrain; for terrain cover classification using synthetic aperture radar (SAR) images, the K-distribution model provides better performance than conventional Gaussian classifiers. The layered random medium model is also used to study the polarimetric response of sea ice. Supervised and unsupervised classification procedures are developed and applied to SAR polarimetric images in order to identify their various earth terrain components when more than two classes are present. These classification procedures were applied to San Francisco Bay and Traverse City SAR images.
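
    To illustrate why a K-distribution model can outperform a Gaussian classifier, here is a minimal per-class maximum-likelihood sketch in Python using a common single-channel, multi-look K-distribution parameterization (mean mu, order nu, L looks); the multivariate polarimetric form discussed above is more involved, and the parameter values below are hypothetical.

        import numpy as np
        from scipy.special import gammaln, kv

        def log_k_pdf(x, mu, nu, L):
            """Log-density of a multi-look K-distribution for SAR intensity x > 0."""
            alpha = 0.5 * (L + nu)
            return (np.log(2.0) - gammaln(L) - gammaln(nu)
                    + alpha * np.log(L * nu / mu)
                    + (alpha - 1.0) * np.log(x)
                    + np.log(kv(nu - L, 2.0 * np.sqrt(L * nu * x / mu))))

        def classify(x, class_params):
            """Assign intensity x to the class with the highest log-likelihood."""
            return max(class_params, key=lambda c: log_k_pdf(x, *class_params[c]))

        # Usage with hypothetical per-class (mu, nu, L) fitted from training pixels:
        params = {"water": (0.2, 10.0, 4), "forest": (0.8, 2.5, 4)}
        print(classify(0.5, params))

    Swapping log_k_pdf for a Gaussian log-density recovers the conventional classifier that the K-distribution model is compared against.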

    Workload Management for Data-Intensive Services

    Data-intensive web services are typically composed of three tiers: i) a display tier that interacts with users and serves rich content to them, ii) a storage tier that stores the user-generated or machine-generated data used to create this content, and iii) an analytics tier that runs data analysis tasks in order to create and optimize new content. Each tier has different workloads and requirements that result in a diverse set of systems being used in modern data-intensive web services.

    Servers are provisioned dynamically in the display tier to ensure that interactive client requests are served as per the latency and throughput requirements. The challenge is not only deciding automatically how many servers to provision but also when to provision them, while ensuring stable system performance and high resource utilization. To address these challenges, we have developed a new control policy for provisioning resources dynamically in coarse-grained units (e.g., adding or removing servers or virtual machines in cloud platforms). Our new policy, called proportional thresholding, converts a user-specified performance target value into a target range in order to account for the relative effect of provisioning a server on the overall workload performance.

    The storage tier is similar to the display tier in some respects, but poses the additional challenge of needing redistribution of stored data when new storage nodes are added or removed. Thus, there will be some delay before the effects of changing a resource allocation appear. Moreover, redistributing data can interfere with the current workload because it uses resources that could otherwise be used for processing requests. We have developed a system, called Elastore, that addresses the new challenges found in the storage tier. Elastore not only coordinates resource allocation and data redistribution to preserve stability during dynamic resource provisioning, but also finds the best tradeoff between workload interference and data redistribution time.

    The workload in the analytics tier consists of data-parallel workflows that can either be run in a batch fashion or continuously as new data becomes available. Each workflow is composed of smaller units that have producer-consumer relationships based on data. These workflows are often generated from declarative specifications in languages like SQL, so there is a need for a cost-based optimizer that can generate an efficient execution plan for a given workflow. Building a cost-based optimizer for data-parallel workflows poses a number of challenges, including characterizing the large execution plan space, developing cost models to estimate execution costs, and efficiently searching for the best execution plan. We have built two cost-based optimizers: Stubby for batch data-parallel workflows running on MapReduce systems, and Cyclops for continuous data-parallel workflows where the choice of execution system is made a part of the execution plan space.

    We have conducted a comprehensive evaluation that shows the effectiveness of each tier's automated workload management solution.
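
    A minimal sketch in Python of the idea behind proportional thresholding as described above: the acceptable band around the performance target widens when few servers are allocated, because adding or removing one server then has a large relative effect. The band shape and the slack constant are illustrative assumptions, not the policy's published tuning.

        def decide(latency, target, n_servers, slack=0.1):
            """Return +1 to add a server, -1 to remove one, 0 to hold."""
            # One server is roughly a 1/n change in capacity, so widen the
            # target range accordingly when n is small.
            band = target * (slack + 1.0 / max(n_servers, 1))
            if latency > target + band:
                return +1  # overloaded: provision another server
            if latency < target - band:
                return -1  # underloaded: release a server
            return 0       # within the target range: hold steady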

    Dynamical generation of fuzzy extra dimensions, dimensional reduction and symmetry breaking

    We present a renormalizable 4-dimensional SU(N) gauge theory with a suitable multiplet of scalar fields, which dynamically develops extra dimensions in the form of a fuzzy sphere S^2. We explicitly find the tower of massive Kaluza-Klein modes consistent with an interpretation as gauge theory on M^4 x S^2, the scalars being interpreted as gauge fields on S^2. The gauge group is broken dynamically, and the low-energy content of the model is determined. Depending on the parameters of the model, the low-energy gauge group can be SU(n), or broken further to SU(n_1) x SU(n_2) x U(1), with mass scale determined by the size of the extra dimension. Comment: 27 pages. V2: discussion and references added, published version
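
    Schematically, models of this kind minimize a scalar potential of the following form (written here in LaTeX; the coefficients and normalizations are convention-dependent assumptions, not quoted from the paper):

        % Schematic potential whose vacua generate a fuzzy sphere
        V(\phi) = \mathrm{Tr}\,\Big( a\,(\phi_i \phi_i + b\,\mathbb{1})^2
                  + c\, F_{ij}^{\dagger} F_{ij} \Big),
        \qquad
        F_{ij} = [\phi_i, \phi_j] - i\,\alpha\,\varepsilon_{ijk}\,\phi_k .

    Minima take the form \phi_i = \alpha \lambda_i with \lambda_i the generators of an SU(2) representation, so that [\phi_i, \phi_j] = i\,\alpha\,\varepsilon_{ijk}\,\phi_k and F_{ij} = 0: the vacuum itself realizes the fuzzy sphere algebra, which is how the extra dimensions arise dynamically.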

    A low cost solution to authentication in passive RFID systems

    This paper proposes a solution to the problem of authentication, aimed at preventing counterfeiting in low-cost RFID systems, by using Physically Uncloneable Functions.
    Damith C. Ranasinghe, Daihyun Lim, Peter H. Cole and Srinivas Devadas
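
    A minimal sketch in Python of the challenge-response pattern typically used with PUFs: the server enrolls challenge-response pairs (CRPs) from the genuine tag and later verifies a fresh response. The database layout and key size are illustrative assumptions, and real PUF responses are noisy, so deployments match within a Hamming-distance threshold rather than testing strict equality as done here.

        import secrets

        class Server:
            def __init__(self):
                self.crp_db = {}  # tag_id -> {challenge: expected response}

            def enroll(self, tag_id, puf, n_pairs=100):
                """Record CRPs from the genuine tag, e.g. at manufacture time."""
                self.crp_db[tag_id] = {}
                for _ in range(n_pairs):
                    c = secrets.randbits(64)
                    self.crp_db[tag_id][c] = puf(c)

            def authenticate(self, tag_id, respond):
                """Challenge the tag with a stored CRP; each pair is used only once."""
                challenge, expected = self.crp_db[tag_id].popitem()
                return respond(challenge) == expected

    Because each CRP is consumed after use, an eavesdropper who records one exchange cannot replay it, and cloning the tag would require reproducing the physically uncloneable function itself.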

    Spectral Characterization of Analog Samples in Anticipation of OSIRIS-REx's Arrival at Bennu

    NASA's Origins, Spectral Interpretation, Resource Identification, and Security-Regolith Explorer (OSIRIS-REx) mission successfully launched on September 8, 2016. During its rendezvous with near-Earth asteroid (101955) Bennu beginning in 2018, OSIRIS-REx will characterize the asteroid's physical, mineralogical, and chemical properties in an effort to globally map the properties of Bennu, a primitive carbonaceous asteroid, and choose a sampling location. In preparation for these observations, analog samples were spectrally characterized across visible, near- and thermal-infrared wavelengths and were used in initial tests of mineral-phase-detection and abundance-determination software algorithms.
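
    As an illustration of the abundance-determination step, here is a minimal linear spectral unmixing sketch in Python using non-negative least squares; the endmember spectra and band values are made up, and no claim is made about the mission team's actual algorithms.

        import numpy as np
        from scipy.optimize import nnls

        def unmix(spectrum, endmembers):
            """Solve min ||E a - s|| with a >= 0, then normalize to fractional abundances."""
            abundances, residual = nnls(endmembers, spectrum)
            total = abundances.sum()
            return (abundances / total if total > 0 else abundances), residual

        # Usage with hypothetical 5-band spectra of two analog minerals:
        E = np.array([[0.10, 0.30], [0.12, 0.28], [0.20, 0.25],
                      [0.35, 0.22], [0.40, 0.20]])  # rows: bands, columns: endmembers
        s = 0.6 * E[:, 0] + 0.4 * E[:, 1]           # synthetic mixed spectrum
        print(unmix(s, E))                          # recovers roughly [0.6, 0.4]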